Predictive value of clinical features and CT radiomics in the efficacy of hip preservation surgery with fibula allograft

Background Despite being an effective treatment for osteonecrosis of the femoral head (ONFH), hip preservation surgery with fibula allograft (HPS&FA) still experiences numerous failures. Developing a prediction model based on clinical and radiomics predictors holds promise for addressing this issue. Methods This study included 112 ONFH patients who underwent HPS&FA and were randomly divided into training and validation cohorts. Clinical data were collected, and clinically significant predictors were identified using univariate and multivariate analyses to develop a clinical prediction model (CPM). Simultaneously, the least absolute shrinkage and selection operator method was employed to select optimal radiomics features from preoperative hip computed tomography images, forming a radiomics prediction model (RPM). Furthermore, to enhance prediction accuracy, a clinical-radiomics prediction model (CRPM) was constructed by integrating all predictors. The predictive performance of the models was evaluated using receiver operating characteristic curve (ROC), area under the curve (AUC), DeLong test, calibration curve, and decision curve analysis. Results Age, Japanese Investigation Committee classification, postoperative use of glucocorticoids or alcohol, and non-weightbearing time were identified as clinical predictors. The AUC of the ROC curve for the CPM was 0.847 in the training cohort and 0.762 in the validation cohort. After incorporating radiomics features, the CRPM showed improved AUC values of 0.875 in the training cohort and 0.918 in the validation cohort. Decision curves demonstrated that the CRPM yielded greater medical benefit across most risk thresholds. Conclusion The CRPM serves as an efficient prediction model for assessing HPS&FA efficacy and holds potential as a personalized perioperative intervention tool to enhance HPS&FA success rates.


Introduction
Osteonecrosis of the femoral head (ONFH) is a refractory orthopedic disorder that can cause considerable hip joint dysfunction and, potentially, disability [1]. As the disease progresses, the femoral head lesion worsens and eventually leads to collapse of the femoral head [2]. After collapse, the femoral head cannot be repaired and must be replaced with an artificial joint. Implementing efficient and timely intervention strategies is therefore essential to treating ONFH [3]. Currently, almost all surgeons concur that the patient's own joint should be preserved as far as possible [4,5].
According to the literature, HPS&FA has a high success rate for hip preservation before artificial joint replacement becomes necessary [6][7][8]. However, HPS&FA still faces two obstacles. (1) Because exact evaluation tools for preoperative patient screening are lacking, some unsuitable patients still undergo the procedure and experience hip preservation failure. (2) There are no accurate methods for predicting the outcome of patients following HPS&FA.
Predictive models can support both diagnosis and prognosis [9]. Currently, they are used to differentiate diseases, screen cases, and predict efficacy [10,11]. Several clinical predictors obtained from blood analysis and follow-up data have been assessed to predict the probability of femoral head collapse [12,13]. Hip CT is critical for the diagnosis of ONFH. Historically, only a few CT imaging features could be evaluated, and only subjectively, by physicians. The size, position, and cumulative range of ONFH were difficult to quantify. Radiomics provides a new option for making fuller use of imaging data [14]: it extracts quantitative imaging features that reflect regional heterogeneity. Predicting HPS&FA success may require the development of a predictive model and the screening of meaningful clinical predictors. However, there are no studies on preoperative patient selection and risk prediction models for the failure of hip preservation in HPS&FA.
To address the aforementioned difficulties, we developed a CRPM that combines radiomics features and clinical predictors and evaluated its performance internally. Our objectives are to: (1) identify suitable candidates for HPS&FA, (2) predict postoperative failure risks, and (3) offer relevant perioperative intervention guidance.

Patient selection
According to literature reports, hip preservation failure has been defined as hip replacement within 3 years after HPS, or a Harris score < 90 with progressive collapse of the femoral head on imaging. From January 2009 to December 2019, 137 patients (168 hips) with pathologically confirmed ONFH who underwent HPS&FA at our institution were enrolled. The criteria for inclusion and exclusion were as follows. Inclusion criteria: (1) All patients received the same surgical intervention, performed by the same surgeon; (2) Preoperative CT image data were available; (3) Patients had no history of hip surgery prior to HPS&FA; (4) The duration of follow-up was greater than three years. Exclusion criteria: (1) Insufficient CT image quality for radiomics analysis; (2) Postoperative cancer, hip tumor, bone tuberculosis, and other malignant disorders. This study used the hip as the unit of analysis and comprised 138 hips from 112 patients. The entire dataset was then randomly divided into a training cohort (n = 96) and a validation cohort (n = 42) at a ratio of 7:3 using computer-generated random numbers [15,16]. The Institutional Review Board and Human Ethics Committee approved this retrospective study and waived the requirement to obtain written informed consent. The case selection process is shown in Fig. 1a, and the flowchart of the study is shown in Fig. 1b.
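A minimal sketch of such a 7:3 random split at the hip level is given here for illustration. The hip identifiers, the seed, and the function name `split_hips` are assumptions made for this example and are not taken from the study; flooring the training size is also an assumption, chosen because it reproduces the reported 96/42 split for 138 hips.

```python
import random

def split_hips(hip_ids, train_fraction=0.7, seed=2022):
    """Randomly split hip identifiers into training and validation lists."""
    rng = random.Random(seed)      # illustrative seed, not from the study
    shuffled = list(hip_ids)
    rng.shuffle(shuffled)
    # Floor the training size: 138 hips * 0.7 -> 96 training hips.
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers for the 138 hips (the hip is the unit of analysis).
hips = [f"hip_{i:03d}" for i in range(1, 139)]
train, validation = split_hips(hips)
print(len(train), len(validation))  # 96 42
```

Because the split is made per hip, the two hips of a bilaterally treated patient may fall into different cohorts; splitting per patient would be an alternative design.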

Clinical data
General data (age, gender, affected side, disease duration, body mass index (BMI), pathogenic factors), preoperative examination indexes (D-dimer, alkaline phosphatase (ALP), white blood cell count (WBC), neutrophil percentage (N), α-L-fucosidase (AFU)), ARCO stage, JIC classification, and preoperative Harris score were gathered and recorded. Following HPS&FA, patients were followed up to record whether they continued to use glucocorticoids or alcohol, their non-weightbearing time, and whether they underwent hip replacement. Additionally, the postoperative Harris score and X-rays were assessed every 3 months for the first year and every 6 months thereafter. All data were finally reviewed on June 30, 2022.

CT radiomics data
Images were acquired from the Picture Archiving and Communication System (PACS). All patients underwent CT examinations of the hip on either the Philips 128-row Brilliance CT or the GE 64-row LightSpeed VCT. Scan parameters were as follows: tube voltage 110-150 kV, tube current 220-680 mA, exposure time 240-800 ms, slice thickness 1.0-3.0 mm, slice spacing 1.0-3.0 mm, and reconstruction matrix 512 × 512.
We then segmented the CT images and extracted features from them. The images were resampled by bilinear interpolation to a layer thickness and spacing of 1 mm and imported into 3D Slicer (https://www.slicer.org, V5.0.2) as NII files. This study focused on femoral head necrosis, defined as fractures of the trabecular bone, texture disorders, sclerosis zones surrounding low-density areas, and cystic degeneration on CT images (sagittal, coronal, and horizontal). Reader 1 (Xin Liu) and Reader 2 (Bin Du) outlined this region of interest (ROI) (Fig. 2) in the bone window of the CT images, with the Hounsfield unit (HU) value set at 1500 HU and the window level at 400 HU. The Pyradiomics plugin automatically extracted 851 imaging features. The consistency of the radiomics features extracted by the two readers was evaluated using the intraclass correlation coefficient (ICC), and consistency was deemed satisfactory when ICC > 0.75. Reader 1 segmented 30 CT images twice within one month to compute the intra-observer ICC, and Reader 2 segmented the selected images separately to calculate the inter-observer ICC. There was no statistically significant difference between Reader 1 and Reader 2, and both the intra- and inter-observer ICC exceeded 0.75. Thus, both intra- and inter-observer feature extraction exhibited high repeatability. All subsequent results were based on the measurements made by Reader 1. Finally, Z-score normalization was applied to guarantee repeatability.
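The reproducibility check can be illustrated with a short sketch. The text does not state which form of the ICC was used, so the two-way random-effects, single-measure form (often written ICC(2,1)) is assumed here; the function name `icc_two_way` and the example values are hypothetical.

```python
def icc_two_way(ratings):
    """Two-way random-effects, single-measure ICC (assumed form: ICC(2,1)).

    ratings: one row per segmented image, one column per reading
    (two readers, or two sessions of the same reader), for ONE feature.
    """
    n = len(ratings)       # number of images
    k = len(ratings[0])    # number of readings per image
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: images (rows), readings (columns), error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical values of one feature measured by two readers on five images.
feature_readings = [[10.1, 10.3], [12.4, 12.2], [9.8, 9.9], [15.0, 14.7], [11.2, 11.5]]
keep_feature = icc_two_way(feature_readings) > 0.75  # threshold from the text
```

In practice the same calculation would be repeated for each of the extracted features, retaining those above the 0.75 threshold.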

Clinical predictors & Rad-score
Both clinical and CT radiomics data were screened. Using univariate and multivariate analysis, clinical predictors were identified from the 16 clinical variables. In addition, LASSO was utilized to determine the optimal radiomics features among the CT radiomics features. The optimal radiomics features with nonzero coefficients were then linearly combined to yield a Rad-score for classification analysis.

Prediction model construction
Clinical data and the Rad-score were used for modeling in R statistical software, and two prediction models were established, namely the CPM and the RPM. Subsequently, to improve prediction performance, a comprehensive model for HPS&FA, namely the CRPM, was developed by integrating the clinical predictors with the Rad-score. Finally, the DeLong test was used to determine the significance of differences in AUC between models.

Performance assessment of the models
ROC curves were plotted and AUC values calculated to analyze the diagnostic efficacy of each model. Then, to visualize the relationship between the variables in the prediction model, a nomogram based on the CRPM was constructed for individualized efficacy prediction. Furthermore, a calibration curve was developed to evaluate the calibration of the nomogram. DCA was employed to evaluate the medical benefit of the nomogram under different risk thresholds. The model was finally validated using the validation cohort.

Statistical analysis
SPSS 26.0 and R statistical software (version 1.2.5042) were employed for statistical analysis and model development. Both univariate and multivariate analyses were conducted in SPSS. The "glmnet" package in R was used to perform LASSO. The "survminer" package was used for Cox proportional hazards survival analysis in order to visualize the relationship between variables and to determine the cut-off values. ROC curves were then plotted and AUC calculated using the "pROC" package, and a nomogram was constructed using the "rms" package. We drew calibration and decision curves to assess the accuracy and clinical utility of the prediction models, respectively, using the "rmda" package.

Clinical characteristics and CT radiomics features
The study enrolled 138 hips (112 patients). No statistically significant differences between the training and validation cohorts were found in any of the 16 clinical variables (Table 1). From these 16 clinical variables, we identified four clinical predictors using univariate and multivariate analysis. Specifically, there were significant differences in age (P = 0.020), JIC classification (P = 0.019), postoperative continued use of glucocorticoids or alcohol (P = 0.001), and postoperative complete non-weightbearing time (P = 0.031) between successful and unsuccessful HPS&FA patients (Table 2).
In this study, LASSO was used to reduce dimensionality and screen radiomics features, as depicted in Fig. 3. The two radiomics features most closely associated with the endpoint of HPS&FA were wavelet-HLH glcm Idmn and original shape Minor Axis Length. The radiomics score is a weighted sum of these features, with the weight coefficients as follows: Rad-score = 6.889501813 - 0.009149794 × original_shape_MinorAxisLength - 5.302489854 × wavelet-HLH_glcm_Idmn.
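As a worked illustration, the Rad-score formula can be evaluated as in the sketch below. The coefficients are copied from the formula in the text and the cut-off is the value reported in the survival analysis; the function names and the example feature values are hypothetical, and the inputs are assumed to be on the same scale as the features used when the model was fitted.

```python
# Coefficients copied from the Rad-score formula reported in the text.
INTERCEPT = 6.889501813
COEF_MINOR_AXIS_LENGTH = -0.009149794       # original_shape_MinorAxisLength
COEF_WAVELET_HLH_GLCM_IDMN = -5.302489854   # wavelet-HLH GLCM Idmn feature

# Cut-off reported in the survival analysis (Rad-scores above it did worse).
RAD_SCORE_CUTOFF = 1.309715

def rad_score(minor_axis_length, wavelet_hlh_glcm_idmn):
    """Linear combination of the two LASSO-selected radiomics features."""
    return (INTERCEPT
            + COEF_MINOR_AXIS_LENGTH * minor_axis_length
            + COEF_WAVELET_HLH_GLCM_IDMN * wavelet_hlh_glcm_idmn)

def is_high_risk(score):
    """Flag a hip whose Rad-score exceeds the reported cut-off."""
    return score > RAD_SCORE_CUTOFF

# Hypothetical feature values for one hip, for illustration only.
example_score = rad_score(minor_axis_length=30.0, wavelet_hlh_glcm_idmn=0.99)
example_flag = is_high_risk(example_score)
```

Both coefficients are negative, so under this formula a larger minor axis length or a larger Idmn value lowers the Rad-score.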

Survival analysis of predictors
In Fig. 4, Kaplan-Meier survival curves showed that most failures were concentrated in the first 24 months. After 36 months, the femoral head survival rate was stable. The HPS&FA success rate was 69.57%, and 46 out of 138 hips failed. There were 26 failures within 12 months, 10 failures between 12 and 24 months, 7 failures between 24 and 36 months, and only three failures after 36 months. A 36-month time endpoint was therefore used for evaluating HPS&FA effectiveness. Moreover, Kaplan-Meier analysis found that HPS&FA failures were significantly increased by age greater than 48 years, JIC type C, continued use of glucocorticoids or alcohol postoperatively, complete non-weightbearing time of less than 3 months, and Rad-scores greater than 1.309715.
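For readers less familiar with survival curves, the product-limit (Kaplan-Meier) estimate underlying such an analysis can be sketched as follows. This is a generic illustration rather than the "survminer" workflow used in the study; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time of each hip (e.g. months)
    events: 1 if hip preservation failed at that time, 0 if censored
    Returns a list of (time, survival probability) at each failure time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        failures = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # failed or censored at t
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up (months) for six hips: three failures, three censored.
months = [6, 10, 14, 30, 40, 40]
failed = [1, 1, 1, 0, 0, 0]
curve = kaplan_meier(months, failed)
```

Comparing such curves between groups defined by a cut-off (for example, age above or below 48 years) is what the stratified panels of a Kaplan-Meier figure display.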

Model construction and comparison
ROC curves of the prediction models in the training and validation cohorts were plotted to assess prediction performance (Fig. 5). The AUC of the CRPM was greater than that of the CPM and RPM in both cohorts. In the training cohort, the AUC of the CRPM was 0.875, with a predictive sensitivity of 0.800 and a specificity of 0.864 at the best cut-off point of 0.480. In the validation cohort, the AUC of the CRPM was 0.918, with a predictive sensitivity of 0.875 and a specificity of 0.885 at the best cut-off point of 0.117 (Table 3).
The DeLong test revealed a significant difference between the CRPM and RPM in the training cohort (P = 0.004675), but none between the CRPM and CPM (P = 0.2224). In the validation cohort, there was a significant difference between the CRPM and CPM (P = 0.03674), but no significant difference between the CRPM and RPM (P = 0.3263). According to the combined results of the DeLong test, ROC, and AUC, the CRPM, which incorporated clinical predictors and the Rad-score, had the superior predictive ability.
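The quantities reported here (AUC, and sensitivity and specificity at a best cut-off) can be illustrated with a small sketch. The text does not state how the best cut-off was chosen, so maximizing Youden's J (sensitivity + specificity - 1) is an assumption; the function names and toy data are hypothetical, and the DeLong test itself is not reproduced.

```python
def auc(scores, labels):
    """AUC as the probability that a failed hip outscores a successful one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A case is predicted positive when its score is >= the cutoff.
    Ties in J are resolved in favor of the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        if sens + spec - 1 > best_j:
            best, best_j = (cut, sens, spec), sens + spec - 1
    return best

# Toy predicted probabilities of failure and observed outcomes (1 = failure).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))          # 0.75
print(best_cutoff(toy_scores, toy_labels))  # (0.35, 1.0, 0.5)
```

Note that a cut-off selected on one cohort generally differs from one selected on another, which is consistent with the different best cut-off points reported for the two cohorts.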

Assessment and validation of CRPM
The CRPM was visualized as a nomogram (Fig. 6a) to better evaluate the predictors. The nomogram demonstrated that the predicted scores of patients in the training cohort were consistent with clinical reality. Younger age, JIC type B of the femoral head, avoiding continued use of glucocorticoids or alcohol after surgery, complete non-weightbearing time close to 6 months after surgery, and a smaller Rad-score significantly increased the success rate of HPS&FA. The calibration curves of the CRPM in the training and validation cohorts showed similarly good agreement between predicted and observed outcomes (Fig. 6b, c).

Clinical use of CRPM
Figure 7 depicts the DCA of the CRPM. If the patient's risk threshold is greater than 5%, the CRPM could add more benefit than either the treat-none or the treat-all strategy in most situations. The DCA in the validation cohort was slightly less satisfactory, but the trend was similar to that in the training cohort, indicating that the model could bring more net benefit to patients within a wide range of risk thresholds.
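The net benefit plotted in a decision curve follows the standard definition, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the risk threshold. A minimal sketch is given here; the function names and the toy data are hypothetical, and the code does not reproduce the "rmda" output of the study.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of intervening on cases with predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # weight given to a false positive
    return tp / n - fp / n * odds

def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that intervenes on everyone."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy predicted failure risks and observed outcomes (1 = failure).
toy_probs = [0.9, 0.6, 0.3, 0.1]
toy_outcomes = [1, 1, 0, 0]
model_nb = net_benefit(toy_probs, toy_outcomes, threshold=0.5)
all_nb = net_benefit_treat_all(toy_outcomes, threshold=0.5)
none_nb = 0.0  # intervening on no one always has zero net benefit
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison a decision curve displays across thresholds.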

Discussion
The CRPM demonstrated superior performance compared to both the RPM and CPM. Using CT radiomics, the RPM can provide detailed information on the necrotic area of the femoral head, whereas the CPM evaluates the patient's physical condition, lifestyle, and laboratory tests. By combining the benefits of both models and improving prediction accuracy, the CRPM is an improved model for predicting early postoperative efficacy. In contrast to previous research, we incorporated as many clinical markers as possible. Furthermore, to quantitatively differentiate patients, we evaluated the critical value of each predictor in order to improve prediction. The predictive parameters were also incorporated into a nomogram for visualization, facilitating clinical application. The HPS&FA nomogram not only predicts effectiveness but also guides perioperative care. With better perioperative supervision, surgeons can choose patients for HPS&FA based on preoperative predictions (age, JIC classification, CT imaging). Further, postoperative predictors, such as glucocorticoid or alcohol use, can be utilized to direct postoperative rehabilitation.
HPS&FA efficacy was influenced by four clinical predictors included in the CRPM. First, glucocorticoid and alcohol use after surgery was associated with hip preservation failure (P < 0.0001). In clinical practice, glucocorticoid-associated osteonecrosis of the femoral head (GA-ONFH) is the most prevalent form [17,18]. Studies have demonstrated that glucocorticoids and alcohol can lead to decreased osteogenic capacity, sparse bone trabeculae, and decreased bone density [19]. In addition, they can cause microcirculation disturbances in the femoral head, leading to local metabolic abnormalities that delay or prevent bone reconstruction [20,21]. Second, in order to provide a relatively stable environment for bone regeneration, the affected side must refrain from bearing weight for a period of time following surgery (P < 0.0001) [22,23]. Creeping substitution of bone trabeculae and angiogenesis occur during this period. As bone repair of the femoral head requires 3-6 months, premature weightbearing will put excessive pressure on the femoral head, causing the bone repair process to fail. After three months of non-weightbearing, partial weightbearing is recommended. In our survival analysis, three months was the cut-off value. The finding that a shorter non-weightbearing time increases failure risk is consistent with clinical experience. Third, age is an objective factor (P < 0.00054). As a person ages, osteoclast activity gradually exceeds osteogenic activity due to an imbalance in bone metabolism [24,25]. As a result, bone strength [...] was not prospective, patients had been screened prior to surgery, which may have eliminated some risk factors.

Conclusion
In conclusion, this study developed the CRPM, a prediction model combining clinical predictors and radiomics features that could predict the effectiveness of HPS&FA and support patient screening and personalized perioperative intervention to improve HPS&FA success. In addition, the CRPM is clinically practicable and effective, and its presentation as a visual nomogram makes it easy to disseminate and apply.

Fig. 2 Representative CT images for failed (a) and successful (b) HPS&FA. c Comparison of basic information and risk factors. Red arrow: necrosis volume was large and involved the lateral column in (a). Green arrow: necrosis volume was relatively small and the lateral column was not involved in (b)

Fig. 3 LASSO feature selection process. a Regression coefficient plot, b Cross-validation plot

Fig. 4 Kaplan-Meier analyses of the 5 selected features for patients in the training cohort. a Exposure, b non-weightbearing, c age, d JIC classification, e Rad-score. f The survival rate after HPS&FA is basically stable at 36 months

Fig. 5

Fig. 6 a Nomogram of the CRPM for predicting the efficacy of HPS&FA. b, c Calibration curves of CRPM in the training (b) and validation (c) cohorts

Fig. 7 DCA of CRPM in training (a) and validation (b) cohorts

Table 1 Comparison of clinical data between training cohort and validation cohort

Table 2 Univariable and multivariable analysis of the training cohort

Table 3 Performance evaluation of training and validation cohorts, including specificity, sensitivity, and 95% confidence interval